RecursiveMAS cuts multi-agent latency by passing embeddings instead of text, delivering up to 2.4x faster inference and 75% fewer tokens

Published on 16 May 2026

Agents “talk” through embeddings, not intermediate text, which makes them faster and cheaper

Researchers from UIUC and Stanford propose RecursiveMAS, a multi-agent framework that replaces text-to-text communication with the passing of latent embeddings. Instead of generating reasoning tokens at every step, agents loop continuous representations through RecursiveLink modules and output text only at the end. Tests across nine benchmarks show up to 2.4x faster inference, a roughly 75% token reduction by recursion round three, and an 8.3% average accuracy gain, with far cheaper training than full fine-tuning.
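To make the idea concrete, here is a toy sketch of latent hand-offs between agents. It is not the authors' implementation: `toy_agent`, `make_link`, `apply_link`, `decode`, and `text_decode_steps` are hypothetical names, the "agents" are simple clipping functions rather than LLMs, and the "links" are random linear maps standing in for learned RecursiveLink modules. It only illustrates the control flow described above, where state stays continuous across rounds and text is produced once.

```python
import random

def make_link(dim, seed):
    """Hypothetical stand-in for a learned link: a random dim x dim linear map."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(dim)]

def apply_link(weights, vec):
    """Multiply the weight matrix by the state vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]

def toy_agent(vec, bias):
    """Stand-in for an agent's forward pass: shift, then clip to [-1, 1]."""
    return [max(-1.0, min(1.0, x + bias)) for x in vec]

def decode(state):
    """Stand-in for final text generation: report the largest coordinate."""
    best = max(range(len(state)), key=lambda i: state[i])
    return "index-" + str(best)

def run_latent_rounds(state, agent_biases, links, rounds):
    """Loop a continuous state through all agents; decode to text once at the end."""
    decode_steps = 0
    for _ in range(rounds):
        for bias, link in zip(agent_biases, links):
            state = toy_agent(state, bias)
            state = apply_link(link, state)  # hand-off stays in embedding space
    answer = decode(state)
    decode_steps += 1
    return answer, decode_steps

def text_decode_steps(num_agents, rounds):
    """A text-passing pipeline would decode at every agent in every round."""
    return num_agents * rounds

biases = [0.0, 0.1, -0.1]
links = [make_link(4, seed) for seed in range(3)]
answer, latent_steps = run_latent_rounds([0.1] * 4, biases, links, rounds=3)
print(answer, latent_steps, text_decode_steps(len(biases), rounds=3))
```

In this toy setup the latent pipeline decodes once, while a text-passing pipeline with three agents and three rounds would decode nine times. The reported token and speed figures come from the benchmarks in the story, not from this sketch.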

Average accuracy rises 8.3% over the strongest baselines
End-to-end inference speeds up 1.2x to 2.4x by avoiding step-by-step text generation
Token usage drops 75.6% by recursion round three versus Recursive-TextMAS
Training updates only ~13M RecursiveLink parameters (about 0.31% of trainable size)

#tokens #embeddings #llm inference #multi-agent ai #research


This summary was produced by Beige from a story published on VentureBeat
